46 research outputs found

    Improved indel detection in DNA and RNA via realignment with ABRA2

    Motivation: Genomic variant detection from next-generation sequencing has become established as an extremely important component of research and clinical diagnoses in both cancer and Mendelian disorders. Insertions and deletions (indels) are a common source of variation and can frequently impact functionality, thus making their detection vitally important. While substantial effort has gone into detecting indels from DNA, there is still opportunity for improvement. Further, detection of indels from RNA-Seq data has largely been an afterthought and offers another critical area for variant detection. Results: We present here ABRA2, a redesign of the original ABRA implementation that offers support for realignment of both RNA and DNA short reads. The process results in improved accuracy and scalability including support for human whole genomes. Results demonstrate substantial improvement in indel detection for a variety of data types, including those that were not previously supported by ABRA. Further, ABRA2 results in broad improvements to variant calling accuracy across a wide range of post-processing workflows including whole genomes, targeted exomes and transcriptome sequencing

    Epstein-Barr Virus-Positive Cancers Show Altered B-Cell Clonality

    Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that additional cancer types have a high prevalence of EBV, we determined EBV viral expression in all The Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in more than 5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. To determine whether the B-cell populations differed, we assembled B-cell receptors for each sample and found that B-cell receptor abundance (P = 1.4 × 10⁻²⁰) and diversity (P = 8.3 × 10⁻²⁷) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population. IMPORTANCE Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in more than 5% of samples in 12 tumor types. EBV infects epithelial cells and B cells, and in B cells it causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects whose tumors were EBV-positive strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. This result is not evidence of causality, but it points to a potential novel biomarker for tumor immune status.

    Assembly-based inference of B-cell receptor repertoires from short read RNA sequencing data with V'DJer

    Motivation: B-cell receptor (BCR) repertoire profiling is an important tool for understanding the biology of diverse immunologic processes. Current methods for analyzing adaptive immune receptor repertoires depend upon PCR amplification of VDJ rearrangements followed by long-read amplicon sequencing spanning the VDJ junctions. While this approach has proven effective, it is frequently not feasible due to cost or limited sample material. Additionally, there are many existing datasets where short-read RNA sequencing data are available but PCR-amplified BCR data are not. Results: We present here V'DJer, an assembly-based method that reconstructs adaptive immune receptor repertoires from short-read RNA sequencing data. This method captures expressed BCR loci from a standard RNA-seq assay. We applied this method to 473 melanoma samples from The Cancer Genome Atlas and demonstrate V'DJer's ability to accurately reconstruct BCR repertoires from short-read mRNA-seq data.

    Prognostic value of B cells in cutaneous melanoma

    Background: Measures of the adaptive immune response have prognostic and predictive associations in melanoma and other cancer types. Specifically, intratumoral T cell density and function have considerable prognostic and predictive value in skin cutaneous melanoma (SKCM). Less is known about the significance of tumor-infiltrating B cells in SKCM. Our goal was to understand the prognostic and predictive value of B cell phenotypic subsets in SKCM using RNA sequencing. Methods: We used our previously published algorithm, V'DJer, to assemble B cell receptor (BCR) repertoires and estimate diversity from short-read RNA sequencing (RNA-seq). We applied machine learning-based cellular phenotype classifiers to measure relative similarity of bulk tumor sample gene expression profiles and different B cell phenotypes. We assessed these aspects of B cell biology in 473 SKCM from the Cancer Genome Atlas Project (TCGA) as well as in RNA-seq data corresponding to tumor samples procured from patients who received CTLA-4 and PD-1 inhibitors for metastatic SKCM. Results: We found that the BCR repertoire was associated with different clinical factors, such as tumor tissue site and sex. However, increased clonality of the BCR repertoire was favorably prognostic in SKCM and was prognostic even after first conditioning on various clinical factors. Mutation burden was not correlated with any BCR measurement, and no specific mutation had an altered BCR repertoire. Lack of an assembled BCR in pre-treatment tumor tissues was associated with a lack of anti-tumor response to a CTLA-4 inhibitor in metastatic SKCM. Conclusions: These findings suggest an important prognostic and predictive role for B cell characteristics in SKCM. This has implications for melanoma immunobiology and potential development of immunogenomics features to predict survival and response to immunotherapy
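
The abstract does not state which diversity or clonality statistic was computed from the V'DJer assemblies. As a minimal sketch, assuming per-clonotype counts are already available and using one common convention (Shannon entropy for diversity, clonality as 1 minus Pielou's evenness), the two measures can be computed as below. The function names and the example clonotype labels are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log) of clonotype abundances; zero counts are ignored."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


def clonality(counts):
    """1 - Pielou's evenness: 0 for a perfectly even repertoire, 1 for a single clone."""
    observed = [c for c in counts if c > 0]
    if not observed:
        raise ValueError("no clonotypes observed")
    if len(observed) == 1:
        return 1.0  # one clone dominates completely
    return 1.0 - shannon_entropy(observed) / math.log(len(observed))


# Hypothetical clonotype label for each assembled BCR contig in one sample.
calls = ["cloneA"] * 8 + ["cloneB", "cloneC"]
counts = list(Counter(calls).values())
print(round(shannon_entropy(counts), 3), round(clonality(counts), 3))
```

Normalizing entropy by the log of the number of observed clonotypes separates evenness from richness, which is one way to compare clonality across samples with very different numbers of assembled receptors.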

    Virus expression detection reveals RNA-sequencing contamination in TCGA

    Background: Contamination of reagents and cross-contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contamination using viral sequences. To detect viruses with high specificity, we developed a publicly available workflow, VirDetect, which detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. Results: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the "common reference", which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e., microarray to GAII to HiSeq) and to link RNA-seq to previous-generation microarrays that routinely used the "common reference". One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. Conclusions: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with XMRV. Second, this infected cell line was added to a pool of cell lines that contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples, most likely during library construction. Thus, these human tumors with H-HPV18 or XMRV reads were likely not infected with H-HPV18 or XMRV.
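
The abstract reports that H-HPV18 and XMRV transcripts significantly co-occurred but does not name the statistical test. A one-sided Fisher's exact test on a 2×2 table of per-sample virus calls is one standard way to assess such co-occurrence; the sketch below assumes that test, and the function name and counts are hypothetical, not the study's data.

```python
import math


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for positive association in a 2x2 table.

    Layout (numbers of samples):
                   XMRV+   XMRV-
        HPV18+       a       b
        HPV18-       c       d

    Returns P(X >= a), where X is the count in cell `a` under the
    hypergeometric null with all row and column margins held fixed.
    """
    row1 = a + b          # samples with HPV18 reads
    col1 = a + c          # samples with XMRV reads
    n = a + b + c + d     # all samples
    # Sum exact integer numerators first, then divide once to limit rounding.
    numer = sum(
        math.comb(row1, x) * math.comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numer / math.comb(n, col1)


# Hypothetical per-sample virus calls (illustrative numbers only).
print(fisher_one_sided(12, 3, 4, 981))
```

Working with exact integer binomial coefficients keeps the tail probability accurate even for thousands of samples, since only the final division is done in floating point.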

    Genetic determinants of cellular addiction to DNA polymerase theta

    Polymerase theta (Pol θ, gene name Polq) is a widely conserved DNA polymerase that mediates a microhomology-mediated, error-prone, double strand break (DSB) repair pathway, referred to as Theta Mediated End Joining (TMEJ). Cells with homologous recombination deficiency are reliant on TMEJ for DSB repair. It is unknown whether deficiencies in other components of the DNA damage response (DDR) also result in Pol θ addiction. Here we use a CRISPR genetic screen to uncover 140 Polq synthetic lethal (PolqSL) genes, the majority of which were previously unknown. Functional analyses indicate that Pol θ/TMEJ addiction is associated with increased levels of replication-associated DSBs, regardless of the initial source of damage. We further demonstrate that approximately 30% of TCGA breast cancers have genetic alterations in PolqSL genes and exhibit genomic scars of Pol θ/TMEJ hyperactivity, thereby substantially expanding the subset of human cancers for which Pol θ inhibition represents a promising therapeutic strategy

    A P53-Independent DNA Damage Response Suppresses Oncogenic Proliferation and Genome Instability

    The Mre11-Rad50-Nbs1 complex is a DNA double-strand break sensor that mediates a tumor-suppressive DNA damage response (DDR) in cells undergoing oncogenic stress, yet the mechanisms underlying this effect are poorly understood. Using a genetically inducible primary mammary epithelial cell model, we demonstrate that Mre11 suppresses proliferation and DNA damage induced by diverse oncogenic drivers through a p53-independent mechanism. Breast tumorigenesis models engineered to express a hypomorphic Mre11 allele exhibit increased levels of oncogene-induced DNA damage, R-loop accumulation, and chromosomal instability with a characteristic copy number loss phenotype. Mre11 complex dysfunction is identified in a subset of human triple-negative breast cancers and is associated with increased sensitivity to DNA-damaging therapy and inhibitors of ataxia telangiectasia and Rad3 related (ATR) and poly (ADP-ribose) polymerase (PARP). Thus, deficiencies in the Mre11-dependent DDR drive proliferation and genome instability patterns in p53-deficient breast cancers and represent an opportunity for therapeutic exploitation

    An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics

    For a decade, The Cancer Genome Atlas (TCGA) program collected clinicopathologic annotation data along with multi-platform molecular profiles of more than 11,000 human tumors across 33 different cancer types. TCGA clinical data contain key features representing the democratized nature of the data collection process. To ensure proper use of this large clinical dataset associated with genomic features, we developed a standardized dataset named the TCGA Pan-Cancer Clinical Data Resource (TCGA-CDR), which includes four major clinical outcome endpoints. In addition to detailing major challenges and statistical limitations encountered during the effort of integrating the acquired clinical data, we present a summary that includes endpoint usage recommendations for each cancer type. These TCGA-CDR findings appear to be consistent with cancer genomics studies independent of the TCGA effort and provide opportunities for investigating cancer biology using clinical correlates at an unprecedented scale. Analysis of clinicopathologic annotations for over 11,000 cancer patients in the TCGA program leads to the generation of TCGA Clinical Data Resource, which provides recommendations of clinical outcome endpoint usage for 33 cancer types
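
Survival analyses built on a clinical outcome resource such as TCGA-CDR typically start from a right-censored time-to-event endpoint. As an illustration only, here is a minimal Kaplan-Meier estimator, assuming each patient contributes a follow-up time and a 0/1 event flag; this input layout and the function name are assumptions, not the TCGA-CDR file format.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where at least one event occurs.

    times  : follow-up time per patient (e.g. days)
    events : 1 if the endpoint event occurred at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this time (events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times in days with event flags.
print(kaplan_meier([120, 300, 300, 450, 800], [1, 0, 1, 1, 0]))
```

Patients censored at the same time as an event are kept in the risk set for that event, which is the usual Kaplan-Meier convention.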

    Driver Fusions and Their Implications in the Development and Treatment of Human Cancers.

    Gene fusions represent an important class of somatic alterations in cancer. We systematically investigated fusions in 9,624 tumors across 33 cancer types using multiple fusion calling tools. We identified a total of 25,664 fusions, with a 63% validation rate. Integration of gene expression, copy number, and fusion annotation data revealed that fusions involving oncogenes tend to exhibit increased expression, whereas fusions involving tumor suppressors have the opposite effect. For fusions involving kinases, we found 1,275 with an intact kinase domain, the proportion of which varied significantly across cancer types. Our study suggests that fusions drive the development of 16.5% of cancer cases and function as the sole driver in more than 1% of them. Finally, we identified druggable fusions involving genes such as TMPRSS2, RET, FGFR3, ALK, and ESR1 in 6.0% of cases, and we predicted immunogenic peptides, suggesting that fusions may provide leads for targeted drug and immune therapy

    The immune landscape of cancer

    We performed an extensive immunogenomic analysis of more than 10,000 tumors comprising 33 diverse cancer types by utilizing data compiled by TCGA. Across cancer types, we identified six immune subtypes—wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant—characterized by differences in macrophage or lymphocyte signatures, Th1:Th2 cell ratio, extent of intratumoral heterogeneity, aneuploidy, extent of neoantigen load, overall cell proliferation, expression of immunomodulatory genes, and prognosis. Specific driver mutations correlated with lower (CTNNB1, NRAS, or IDH1) or higher (BRAF, TP53, or CASP8) leukocyte levels across all cancers. Multiple control modalities of the intracellular and extracellular networks (transcription, microRNAs, copy number, and epigenetic processes) were involved in tumor-immune cell interactions, both across and within immune subtypes. Our immunogenomics pipeline to characterize these heterogeneous tumors and the resulting data are intended to serve as a resource for future targeted studies to further advance the field